Sprite 1984

/ Sprite 1984 - 1993 / Sprite 1984 - 1993.iso / admin / bugs / bugs.archive.old

Text File | 1990-12-11 | 14.9 KB | 379 lines

1. (zorn) Sometimes tonkawa goes into `slow mode' and takes a very long time to respond over rlogin connections. Paul has noticed this and said that it had something to do with tonkawa completely losing its host tables. Paul's solution is to reboot tonkawa. Perhaps Mendel is already aware of the problem.

2. (zorn) When I rlogin to sage, I get the following message:

    Sprite SPRITE VERSION 1.0 (Brent sun3) (16 Dec 88 15:29:48)
    Welcome to Sprite
    *** compat: Cannot decode user status value ffffffff
    sage 1;

3. (zorn) stty on Sprite doesn't have the rows and columns attributes, which can be used to change how big vi thinks your window is.

5. (zorn) If a program creates a big file and uses up all the disk space on sioux (Sun2 fileserver for SPUR and tonkawa), sioux hangs and even if the process creating the file is deleted, you can't remove the file using up all the space, and the only solution I know is to reboot sioux, tonkawa, and spur.

10. (zorn) I often forget that I've got processes in the DEBUG state and since their executables are still in use, even when I delete the executable (like scl, 3 megabytes worth), the file space isn't reclaimed because a DEBUG process still has a pointer to it. Could you give me a command that will guarantee to kill all my processes in the DEBUG state that I'm not currently debugging?

11. (fred) Missing fonts for TeX.

16. (rab) Excessive mallocs at user level will crash sprite.

17. (jhh) ProcessLine called Fs_NotifyReader passing it nil as a data pointer, causing a bus error. eventNotifyToken is nil for some reason. This happened after the following sequence of events: try to print something on sloth, lpd is started, THEN we plug in the printer. Sloth crashed right after this, which leads me to believe there is a connection here.

18. (mendel) Kernel uses malloc'ed memory after free'ing it.

20. (mendel) Readdir doesn't fix byte order problems properly for spur. Need to add system calls for readdir and statdir.

21.
(mgbaker) When I link a kernel in /sprite/src/kernel/mgbaker, the linker (/usr/bin/ld) on the sun4 says "write output error: l.aXXXXXX not found."

22. (ouster) It appears to me that there's a repeatable bug whereby pseudo-devices don't close down correctly. If I start X running, then use L1-K to kill X, I'm left with a bunch of csh processes in RWAIT state (one for every Tx window that was open). I tried to "kill -DEBUG" them to see where they are, but the processes won't enter the debugger. I suspect that this is because they are waiting on their stdin pseudo-device.

23. (douglis) If I rlogin to sprite, start vi, use ^Z to suspend it, and then try to resume, I get thrown back into the shell with my terminal in raw mode and my vi nowhere to be seen.

24. (ouster) The reason vi doesn't know about the window size is because our terminal driver doesn't (yet) support the TIOCSWINSZ and TIOCGWINSZ ioctls.

25. File servers run out of memory.

26. (fwo) I'm getting another NFS problem while moving data to rosemary:

    NFSPROC_WRITE: RPC: Unable to send; errno = socket is not connected

The result is that tar cannot restore the file:

    tar: Tried to write 4096 bytes to file, could only write -1: my.filename: invalid argument

27. (douglis) stat of /dev/console
This doesn't seem to work. I thought it did at first, and I modified loadavg to use this rather than relying on the internal kernel variable. However, it didn't show the time getting updated, and statting /dev/console just showed it not getting a new time. It's said 19:32:05 (read: "18:32:05") for the past couple of minutes.

28. (douglis) bug: suspending a migrated process locks out system
That is, if one suspends a pmake running remotely, that remote host is marked forever as unavailable for other migrations. This is just Yet Another Reason for a more sophisticated system with handlers for various events, including signals to migrated processes.
In the meantime, I will change the daemon that watches migrated processes and make sure it only believes the "in use" bit on its own machine if there is a foreign *runnable* process. Of course, this means if someone suspends and immediately resumes a remote job, then the host could get flagged as available and get loaded down by a second job, but them's the breaks.

29. (douglis) behavior on failed page write:
We discussed recently how the behavior had been changed to kill processes rather than wait for space to free up. Is this correct? In general, I'd prefer to wait than to have my entire window system die because we run out of swap space!

30. (douglis) The kiss of death: migrate a process onto a machine when the disk space fills and the page-out fails. Paprika lost its window system because a pageout failed when I was starting a make, and at the same time, fenugreek died with a watchdog reset. I now just noticed that several other hosts died at the same time and are not reachable via kmsg, implying they may have hit watchdog resets too. Looks like I have to make that migration code more robust. In the meantime, this is another case for keeping /a less than full! :)

31. (douglis) bug: pmake debug children, define syntax
There seems to be a problem with pmake, compared to make, that kept the following construct in local.mk from working:

    CFLAGS += -DUSERMEM=`cat USERMEM`

When I tried to run mkmf on compress, which uses this construct, I hit some error messages followed by an endless loop with the same process being continued and going into the debugger repeatedly:

32. (ouster) Bug: finger/rup database corrupted
The finger/rup database seems to have mangled itself over the weekend. For example, "rup" says that oregano is down, but I can rlogin to it and it's running Sprite. Also, "finger" says that almost no-one is logged in... although I can't confirm that this is wrong, it looks suspicious.

33.
(mgbaker) another funny nfs thing
When I'm working in the sprite hierarchy, but on rosemary, and I do a "ci -l" of some files to rcs, weird things happen. I can continue to edit the files on sprite, and everything is happy, but from that point on, none of the changes will be reflected in the same files when I view them from rosemary. The dates don't change and the data doesn't change. Also, the number of links for them, listed by "ls", is 0. It should be 1. (This is all easily curable by removing the files from rosemary and rewriting them again on sprite.)

34. (ouster) Bug: ipServer crash
My ipServer died shortly after the Mint crash today (but I suspect that the two are only marginally related).

35. (douglis) Things would be much easier if we had a unix-ish syslog approach, in which syslog messages are stored in a regular file and cycled through on a daily or weekly basis to keep from storing old messages too long. With syslog going only to the console, or some other process reading it, there's no way for someone else to see the messages.

36. (jhh) ditroff problem?
I get the following message when I try to print a man page.

    ditroff -Pcad -man fstrash.man
    troff: Can't open /sprite/lib/ditroff/devpsc/c.out

There is a file /sprite/lib/ditroff/devpsc/C.out. Did something change recently?

37. (brent) pseudo-device access/modify times
Garth pointed out an interesting behavior of the modify time of a tx window. Go to an idle tx window and do an ls -l of its corresponding tx pseudo-device. The modify time won't reflect your typing of the ls command. If you repeat the ls then the modify time will be current, reflecting the generation of the prompt for the second ls. If you do ls -lu to get the access time, it will be current. Now, as a final twist, if you use the stat program you always get the correct dates. That is (apparently) because ls uses stat(filename), while stat uses fstat(openFile). Frankly, that these are different is still a surprise to me.
I'll mull this one over, but may not change anything.

38. (mendel) fstat() doesn't get correct modify time
If a process has a file open and another process modifies the file, the modify time as returned by the fstat() library routine is not updated. The stat() routine does return the correct modify time. This is why the unfsd has been working so poorly on sprite.

39. (gibson) It looks to me that whenever I invoke enscript with more than one file on the command line, it says it can't open the second file and dies.

40. (ouster) RPC/sendmail messages
I just noticed the following messages in my syslog window:

    RpcScatter: rpc 7 param size + off (4 + 0) > (0)
    <19>Jan 26 11:22:30 sendmail[91b3d]: tung@ibm.com... reply: read error
    <19>Jan 26 11:29:03 sendmail[41b3b]: mogul@decwrl.dec.com reply: read error

Does anyone know if any of these messages is cause for concern? Do the sendmail messages mean that mail was lost?

41. (brent) bug: var val
When Mike implemented the system call for vmcmd he used a macro trick that doesn't work with gcc, so the kernel generates printfs when you change VM constants that say "var val is 1200", etc. There are two bugs. The first is that the kernel shouldn't generate these print statements. The user program that changes the value should. The second bug is the "var val" macro bug, which won't need to be fixed if the system call is changed to return the old variable value. The Fs_Command system call is set up to return the old value of a variable being changed, but the Vm_Command system call is not. Perhaps a third bug is that these two system calls are distinct, and there is even another cesspool-call, er, system call, called Sys_Stat.

42. (mgbaker) mkmf becoming a pain
Is there a way to ask mkmf only to redo things for a particular machine? It does not scale well as we increase the number of machines we have ported to. It can take several minutes to add one file to one machine's subdirectory.

42.
(hilfingr) File rot on tonkawa still
FYI: The file rot problem on tonkawa is still with us. It struck within the last few days (I believe).

43. (jhh) Xsprite bug
Xsprite went into the debugger on me. I poked around a bit and found it had a segmentation violation in the function PdevReadClient, file os/pdev.c line 825. I don't know what a real client structure is supposed to look like, but a streamID > 327000 looks bad to me. I think Dispatch somehow tried to do a read on a nonexistent client.

44. (Gibson) cp -p problem
I tried

    cp -p /nfs_file_1 /nfs_file_2

but /nfs_file_1 was read_only to me, so the result was a /nfs_file_2 file of zero length and a permission denied error message. It seems that the read only protection is done first but then the copy writes fail. I checked on UNIX and this does not happen (different cp command?).

45. (gibson) nfs and symbolic links
Running under sprite, I make a symbolic link through nfs to a unix filesystem, but I can't read it or follow it. The problem is different symbolic link formats under unix and sprite: sprite adds a null to the end of the link file (according to brent), so unix is unhappy when nfs tells it to access the link. Brent knows about this and may do something about it.

There are actually two problems. The first is the one about different link formats. I'll be changing Sprite servers to be UNIX-like sometime soon. The other problem was that you couldn't readlink() a symbolic link via nfsmount, regardless of who created the link. I've fixed this bug.
    brent

46. (mgbaker) Vm tracing bug?
In going over the assembly routines for the virtual memory module, it appears to me that the VmMachTracePMEG routine is buggy on the sun2's and 3's. It should trace a page if the page is resident and also is either referenced or modified. But it traces it in some cases even if it isn't resident. So unless the modified and referenced bits are cleared when the page isn't resident, this routine will give wrong numbers.
Is there anyone out there who depended on this routine? If so, I'll check how the referenced and modified bits are cleared to see if things were okay or not...

47. (douglis) bug: recovery consistency crashed network
We had to reboot the world because mint got itself a bit confused. I think Brent was aware of this when he tried to continue it earlier after "client N not last writer M" messages kept coming out (each one hitting a breakpoint). The problem is, after mint rebooted it hit the same error, each time with a different client. My brute-force solution was to reboot the clients so they wouldn't try to force anything on mint that it couldn't handle. Brent, the lineprinter output in the machine room contains some other interesting messages about stale files & such that I hadn't seen before. Maybe they're relevant.

48. (douglis) bug: scsi tape error
The thing we hit last time, and which we still hit, is as follows:

    SCSITape, 10 sense bytes => Unknown sense size
    DevSCSITapeError: Unknown tape drive type
    SCSI-0: Sense error (7-0) at <600>

This is kernel (JohnH sun2) 1/15/89.

49. (douglis) bug: lpd doesn't compile
I did a mkmf so I could install a sun2 version and not use a year-old version on the sun2s at boottime, with named pipes. I hit a compiler error in lpd.c, both for sun2s and then again for sun3s.

50. (jhh) migration bug
I tried to kill a migrated process (a ps on the process gave me one of those "couldn't read segment info..." messages) and it put thyme into the debugger. The process control block was garbage.

51. (mendel) sendmail bug
I can't send mail from my Sprite account. Sendmail goes into a loop forking itself and exiting.

52. (jhh) bug
When I try to rcp my kernel to ginger I get the following error:

    Pdev_Write, no return amtWritten (/hosts/sage.Berkeley.EDU/netTCP)
    RequestResponse request too large

If I do the rcp from ginger everything works.

53. (douglis) fs/migration bug
I was running a compile, which hung.
I noticed in my syslog:

    Fs_RpcWrite, stale handle <0,192> client 5
    2/16/89 10:56:32 basil (5) starting recovery

This recovery never finished. Turns out in addition, I had only one process in the migrated state but at least one process that was running on basil but on paprika was in the NEW state. Signalling the NEW process had no effect, and signalling the process using its id on basil caused the rpc to hang.
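
Note on items 37 and 38: both reports come down to stat(filename) and fstat(openFile) disagreeing about a file's modify time. The following is a rough sketch of how that disagreement could be checked for. It is written in present-day Python and is not Sprite code. The helper names (mtimes_agree_after_external_write, run_check) are made up for illustration, and a second file descriptor stands in for the "another process" of item 38. On a system behaving as item 38 describes, the check would presumably return False.

```python
import os
import tempfile


def mtimes_agree_after_external_write(path):
    """Return True if fstat() on an already-open descriptor reports the same
    modify time as stat() on the path after a second writer changes the file."""
    # Descriptor held open across the modification (the case from item 38).
    reader_fd = os.open(path, os.O_RDONLY)
    try:
        # Assumption: an independent descriptor approximates another process.
        writer_fd = os.open(path, os.O_WRONLY | os.O_APPEND)
        try:
            os.write(writer_fd, b"modified by second writer\n")
        finally:
            os.close(writer_fd)
        by_fd = os.fstat(reader_fd).st_mtime_ns   # what fstat(openFile) sees
        by_name = os.stat(path).st_mtime_ns       # what stat(filename) sees
        return by_fd == by_name
    finally:
        os.close(reader_fd)


def run_check():
    """Run the comparison against a throwaway temporary file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return mtimes_agree_after_external_write(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("fstat and stat agree:", run_check())
```

The sketch compares the two values directly rather than checking that the modify time advanced, so the result does not depend on clock granularity.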